Comparing common and rare single variant and gene aggregate instrumentation strategies for molecular MR

Author

Aimee Hanson

Published

January 12, 2026

Introduction

Classically, Mendelian Randomisation (MR) methods utilising trait-associated genetic variants from GWAS studies have employed common polymorphisms (population MAF > 1%) targeted by genotyping chips, or reliably imputed from reference populations, to instrument modifiable exposures. However, common variants tested in GWAS typically explain a very small fraction of the variability in a measured trait, potentially exhibit pleiotropic effects acted upon by balancing selection or as a consequent of genetic linkage, and are rarely causal. Rare variants, which typically show large biological effects (e.g. through abolishing protein expression) provide a means of more unambiguously instrumenting relevant molecular processes. Comparison of causal estimates derived using differing methods of genetically instrumenting modifiable exposures may enhance the interpretation of the biological mechanisms underlying exposure-outcome relationships. This includes using variants from across the allele frequency spectrum, but also leveraging rare variant aggregate approaches to instrument gene-level perturbations in expression and function.

Here, caual estimates for pairwises combinations of 2937 molecular exposures (O-link plasma proteins) and 11 health related compelex traits (below) have been derived using nine instrumentation strategies, with further instrument splitting into cis (variants within 1MB of the encoding gene start and end site), trans and full variant sets (a total of 27 instrument sets per exposure).

Outcomes: Low density lipoprotein levels (LDL), Body Mass Index (BMI), Vitamin D (VitD), Triglycerides (Trig), Glycated Haemoglobin (HbA1c), Mean Platelet Volume (PlatVol), IGF-1 (IGF1), Waist-to-Hip Ratio (WHRadjBMI), Red Blood Cell Erythrocyte Count (RBCCount), Mean Corpuscular Volume (CorpVol) and Systolic Blood Pressure (SBP)

Instrumental variable (IV) sets

Nine sets of IVs for each protein exposure have been extracted from across two studies:
GWAS common variants: “Plasma proteomic associations with genetics and health in the UK Biobank” (Sun, B. 2023, Nature).
WES variants/gene aggregate tests: “Rare variant associations with plasma protein levels in the UK Biobank” (Dhindsa, R. 2023, Nature).

Common GWAS
Common genome-wide fine-mapped variants from the GPMAP (>1% MAF) (cis, trans and combined)
Common genome-wide clumped and pruned variants (>1% MAF) (cis, trans and combined)

WES (variants)
Common exome-wide (>1% MAF) (cis, trans and combined)
Rare exome-wide (0-1% MAF) (cis, trans and combined)
Ultra-rare exome-wide (0-0.1% MAF) (cis, trans and combined)
LD clumping was performed for common variants (for which LD matrices have been derived) only. For rare variants, instrument sets were stored both pre and post filtering to the top associated variant per gene for sensitivity analyses.

WES (masks)
pLOF burden mask (cis, trans and combined)
missense/low-confidence burden mask (cis, trans and combined)
pLOF/missense/low-confidence burden mask (cis, trans and combined)
synonymous burden mask (cis, trans and combined)

Data import

For direct comparison of the performance of instrument sets, the theshold for pQTL significance has been set at 5x10^-8 for both rare and common variants. MR results for all exposure-outcome pairs, derived using differing IV sets, and the harmonised summary statistics of those variant sets, are read in below.

Read in MR results from scripts/04_analysis_proteins/001_performmolecularMR.R and harmonised summary statistics:

Code
files <- list.files(res_dir, pattern = "results_molecular_MR_GW5e-08_EW5e-08", full.names = T)
files_snps <- list.files(file.path(data_dir, "harmonised"), pattern = "harmonised_studies_GW5e-08_EW5e-08_[A-Z]", full.names = T)

# Import MR results
res_MR <- list()
for(i in 1:length(files)){
  load(files[i])
  res_MR[[i]] <- results_MR
}
names(res_MR) <- sub(".*08_(.*)\\.rda","\\1", files)

# Import harmonised SNP data <- list()
harmonised_snps <- list()
for(i in 1:length(files_snps)){
  load(files_snps[i])
  harmonised_snps[[i]] <- harmonised_studies
}
names(harmonised_snps) <- sub(".*08_(.*)\\.rda","\\1", files_snps)

rm(harmonised_studies)

# 0.05/2937 proteins tested per outcome
alpha <- signif(0.05/length(res_MR$BMI),2)

# IVsets
IVsets <- c("dhindsa_exwas_mask_ptv" = "mask:pLOF",
            "dhindsa_exwas_mask_raredmg" = "mask:missense",
            "dhindsa_exwas_mask_ptvraredmg" = "mask:pLOF|missense",
            "dhindsa_exwas_mask_syn" = "mask:synonymous",
            "dhindsa_exwas_variant_ultrarare" = "exome:ultrarare",
            "dhindsa_exwas_variant_ultrarare_filt" = "exome:ultrarare (filtered)",
            "dhindsa_exwas_variant_rare" = "exome:rare",
            "dhindsa_exwas_variant_rare_filt" = "exome:rare (filtered)",
            "dhindsa_exwas_variant_common" = "exome:common (clumped)",
            "sun_gwas_variant_common_clumped" = "genome:common (clumped)",
            "sun_gwas_variant_common_finemapped" = "genome:common (fmapped)")

cols_IVsets <- c("darkorchid4","darkorchid3","darkorchid2","orchid",
                 "steelblue4","steelblue4","steelblue1","steelblue1","darkorange2","orange","orange")
names(cols_IVsets) <- IVsets

# Variant-based instrument sets
variant_IVsets <- c("dhindsa_exwas_variant_ultrarare" = "exome:ultrarare",
                    "dhindsa_exwas_variant_ultrarare_filt" = "exome:ultrarare (filtered)",
                    "dhindsa_exwas_variant_rare" = "exome:rare",
                    "dhindsa_exwas_variant_rare_filt" = "exome:rare (filtered)",
                    "dhindsa_exwas_variant_common" = "exome:common (clumped)",
                    "sun_gwas_variant_common_clumped" = "genome:common (clumped)",
                    "sun_gwas_variant_common_finemapped" = "genome:common (fmapped)")

cols_variant_IVsets <- c("steelblue4","steelblue4","steelblue1","steelblue1","darkorange2","orange","orange")
names(cols_variant_IVsets) <- variant_IVsets

# Filter variant-based instruments 
variant_IVsets_indep <- c("dhindsa_exwas_variant_ultrarare_filt" = "exome:ultrarare (filtered)",
                          "dhindsa_exwas_variant_rare_filt" = "exome:rare (filtered)",
                          "dhindsa_exwas_variant_common" = "exome:common (clumped)",
                          "sun_gwas_variant_common_clumped" = "genome:common (clumped)",
                          "sun_gwas_variant_common_finemapped" = "genome:common (fmapped)")

Instrument characteristics

Comparison of IV sets

The average number of genetic instruments (pQTLs) per protein across instrument sets

Rare and common variant instruments (pQTLs) have been partitioned according to their proximity to the instrumented protein-coding gene into cis (+- 1MB from gene boundaries) and trans (>1MB distant) sets. A similar number of cis-IVs per protein can be derived from each of the IV sets (typically <5). Proteins that are instrumented with >50 rare cis variants fall in gene dense genomic regions. Available IVs drop when restricting to one rare variant per annotated gene, but rare IVs have not been explicitly LD pruned so it is possible that in some instances these variants are not independent. More trans-IVs are availble in the common-varaint instrument sets (both GWAS and ExWAS derived), suggesting there are fewer rare and ultra-rare variants that associate with protein levels in trans.

Code
# Number of instruments per protein for each IV set

num_IVs <- lapply(res_MR, function(x){
  df_out <- data.frame(expand.grid(IV_set = IV_sets, cis_trans = c("cis","trans")))
  
  for(i in 1:length(x)){
    df <- x[[i]] |> dplyr::summarise(unique(nsnp), .by = c(IV_set, cis_trans))
    colnames(df)[3] <- names(x[i])
    df_out <- left_join(df_out, df, by = c("IV_set","cis_trans"))
  }
  
  df_out <- df_out |> tidyr::pivot_longer(cols = 3:length(df_out)) |> na.exclude()
  return(df_out)
})

num_IVs <- do.call("rbind", num_IVs) |> dplyr::mutate(outcome = sub(".*_","",name))

# Plot
p1 <- mapply(FUN = function(x,y){
  x <- x |> dplyr::filter(cis_trans == y) |>
    dplyr::mutate(label = factor(IVsets[IV_set], levels = IVsets)) |>
    dplyr::filter(label %in% grep("exome|genome", IVsets, value = T))
  
  ggplot(x, aes(x = label, y = value, fill = label)) +
    facet_grid(.~outcome) +
    stat_boxplot(geom ='errorbar', width = 0.2) + 
    geom_boxplot(width = 0.7) +
    scale_y_log10() +
    scale_fill_manual(values = cols_IVsets) +
    theme_minimal() +
    theme(axis.text.x = element_text(angle = 55, hjust = 1),
          legend.position = "none",
          plot.title = element_text(hjust = 0.5)) +
    labs(x = "IV set",
         y = "Num. instruments", 
         title = paste0("Number of ",y ," instruments per plasma protein"))
  
}, x = list(num_IVs,num_IVs), y = c("cis","trans"), SIMPLIFY = FALSE)

gridExtra::grid.arrange(grobs = p1)
Figure 1

The number of proteins able to be instrumented with pQTLs across instrument sets

Note, only instrument sets including >=3 independent variants are considered.

Although the power to detect highly significant rare-variant associations is expected to be lower than that for common variants due to to low alternate allele counts, an approximately equivalent percentage (~25%) of the plasma protein can be instrumented using GWAS common or ExWAS rare variants in cis (and ~10-12% with ultra-rare variants). ExWAS common variants provide less coverage, likely because fewer independent cis-variants will typically be found in the coding region of a gene than within the 1MB surrounding putative regulatory region captured by GWAS. Alternatively, over 50% of the proteome can be instrumented with ExWAS and GWAS derived common variants relative to <10% with rare and ultra-rare variants. This again suggests there are fewer (or less readily detected) rare trans-variant associations with plasma protein levels. This is slightly counterintuitive, as it would be anticipated that stronger rare varaint cis-effects on protein levels would also be detected as strong trans associations with proteins in interacting pathways.

What are some potential explanations for this?:
- The common variants that exist at intermediate frequencies in a population (i.e. have not reached fixation) are acted upon by competing balancing selective forces and show a greater number of trans assocaitions across diverse biological processes (i.e. are more plieotropic). Converse to this is the idea that rare variants can only be observed in genomic regions where the DNA sequence has been evolutionarially optimised (conserved) for a highly specific biological mechanism.
- The nature of the genes instrumented by common and rare variants is fundamentally different. For example the proteins instrumented by common cis-IVs are more often involved in the regulation of diverse cellular pathways (e.g. variants in enhancers impacting transcriptional or cell signalling regulation), where strong-effect rare variants modify protein levels or function directly, with consequences for a smaller numebr of interacting/downstream proteins?

Questions:
- What is the relative strength (betas) of cis and trans, rare and common pQTL effects?
- What is the intersection of the plasma proteins that have rare and common IVs available (significant pQTL associations)?

Code
# Number of proteins with significant causal estimates across instrument sets
#summary_sig <- lapply(res_MR, summarise_results, p_thresh = alpha, n_snps = 3)
#summary_sig_cis <- lapply(res_MR, summarise_results, p_thresh = alpha, n_snps = 3, cis_trans = "cis")
#summary_sig_trans <- lapply(res_MR, summarise_results, p_thresh = alpha, n_snps = 3, cis_trans = "trans")

#save(summary_sig, file = file.path(res_dir, "analysis", "IVWsummary_all_molecular_MR_GW5e-08_EW5e-08_min3IVs.rda"))
#save(summary_sig_cis, file = file.path(res_dir, "analysis", "IVWsummary_cis_molecular_MR_GW5e-08_EW5e-08_min3IVs.rda"))
#save(summary_sig_trans, file = file.path(res_dir, "analysis", "IVWsummary_trans_molecular_MR_GW5e-08_EW5e-08_min3IVs.rda"))

load(file.path(res_dir, "analysis", "IVWsummary_all_molecular_MR_GW5e-08_EW5e-08_min3IVs.rda")) #summary_sig
load(file.path(res_dir, "analysis", "IVWsummary_cis_molecular_MR_GW5e-08_EW5e-08_min3IVs.rda")) #summary_sig_cis
load(file.path(res_dir, "analysis", "IVWsummary_trans_molecular_MR_GW5e-08_EW5e-08_min3IVs.rda")) #summary_sig_trans

# Number of proteins with instrument sets (of >= n_snps) available
avail_IVs <- do.call("rbind",lapply(summary_sig, function(x){apply(x$min_p, MARGIN = 2, function(y){sum(!is.na(y))})/nrow(x$min_p) * 100}))
avail_IVs_cis <- do.call("rbind",lapply(summary_sig_cis, function(x){apply(x$min_p, MARGIN = 2, function(y){sum(!is.na(y))})/nrow(x$min_p) * 100}))
avail_IVs_trans <- do.call("rbind",lapply(summary_sig_trans, function(x){apply(x$min_p, MARGIN = 2, function(y){sum(!is.na(y))})/nrow(x$min_p) * 100}))

# Plot
p2 <- mapply(FUN = function(x,y){
  x[,names(variant_IVsets)] |> 
    as.data.frame() |> 
    tibble::rownames_to_column("outcome") |>
    tidyr::pivot_longer(cols = 2:8) |>
    dplyr::mutate(label = factor(variant_IVsets[name], levels = variant_IVsets)) |>
    ggplot(aes(x = label, y = value, fill = label)) +
    facet_grid(.~outcome) +
    geom_col() +
    scale_fill_manual(name = "", values = cols_variant_IVsets) +
    scale_y_continuous(limits = c(0,100)) +
    theme_minimal() +
    theme(axis.text.x = element_text(angle = 55, hjust = 1),
          plot.title = element_text(hjust = 0.5),
          legend.position = "none") +
    labs(x = "IV set",
         y = "%", 
         title = paste0("Percentage of plasma proteins with ",y, "IV sets available (with n>=3 instruments)"))
}, x = list(avail_IVs, avail_IVs_cis, avail_IVs_trans), y = c("", "cis-", "trans-"), SIMPLIFY = FALSE)

gridExtra::grid.arrange(grobs = p2)
Figure 2

Instrument overlap

What is the overlap in the proteins that are instrumented (using >= 3 variants) by differing instrumentation strategies?

Code
exposures_cis <- data.frame(protein = sub("_BMI","",rownames(summary_sig_cis$BMI$min_p)),
                            dhindsa_exwas_variant_ultrarare = !is.na(summary_sig_cis$BMI$min_p[,"dhindsa_exwas_variant_ultrarare"]),
                            dhindsa_exwas_variant_rare = !is.na(summary_sig_cis$BMI$min_p[,"dhindsa_exwas_variant_rare"]),
                            dhindsa_exwas_variant_common = !is.na(summary_sig_cis$BMI$min_p[,"dhindsa_exwas_variant_common"]),
                            sun_gwas_variant_common_clumped = !is.na(summary_sig_cis$BMI$min_p[,"sun_gwas_variant_common_clumped"]))

exposures_trans <- data.frame(protein = sub("_BMI","",rownames(summary_sig_trans$BMI$min_p)),
                              dhindsa_exwas_variant_ultrarare = !is.na(summary_sig_trans$BMI$min_p[,"dhindsa_exwas_variant_ultrarare"]),
                              dhindsa_exwas_variant_rare = !is.na(summary_sig_trans$BMI$min_p[,"dhindsa_exwas_variant_rare"]),
                              dhindsa_exwas_variant_common = !is.na(summary_sig_trans$BMI$min_p[,"dhindsa_exwas_variant_common"]),
                              sun_gwas_variant_common_clumped = !is.na(summary_sig_trans$BMI$min_p[,"sun_gwas_variant_common_clumped"]))

upset_cis <- list(
  "exome:ultrarare" = exposures_cis$protein[exposures_cis$dhindsa_exwas_variant_ultrarare],
  "exome:rare" = exposures_cis$protein[exposures_cis$dhindsa_exwas_variant_rare],
  #"`exome:common (clumped)`" = exposures_cis$protein[exposures_cis$dhindsa_exwas_variant_common],
  "`genome:common`" = exposures_cis$protein[exposures_cis$sun_gwas_variant_common_clumped])

upset_trans <- list(
  "exome:ultrarare" = exposures_trans$protein[exposures_trans$dhindsa_exwas_variant_ultrarare],
  "exome:rare" = exposures_trans$protein[exposures_trans$dhindsa_exwas_variant_rare],
  #"`exome:common (clumped)`" = exposures_trans$protein[exposures_trans$dhindsa_exwas_variant_common],
  "`genome:common`" = exposures_trans$protein[exposures_trans$sun_gwas_variant_common_clumped])

# Upset plots
upset(fromList(upset_cis), order.by = "freq", text.scale = 1.5, mainbar.y.label = "cis-instrumented \nproteins")

Code
upset(fromList(upset_trans), order.by = "freq", text.scale = 1.5, mainbar.y.label = "trans-instrumented \n proteins")

The strength of IV associations

What are the relative effect sizes of instruments (pQTLs) in the different IV sets?

First pull the pQTL associations for each IV set into a consistant format:

Code
# Pull exposure associations from harmonised SNP sets
n_genes <- length(harmonised_snps[[1]]) # 2937
names_genes <- sub("_.*", "", names(harmonised_snps[[1]]))
n_IVsets <- length(harmonised_snps[[1]][[1]]) #11 IV sets
names_IVsets <- names(harmonised_snps[[1]][[1]])

# pqtls <- lapply(seq_len(n_genes), function(gene){
#   out <- lapply(seq_len(n_IVsets), function(set){
# 
#     dfs <- lapply(harmonised_snps, function(x){
#       df <- x[[gene]][[set]]
#       if("gene.outcome" %in% colnames(df)){
#         df$gene <- df$gene.outcome
#       }else{df$gene <- toupper(df$SNP)}
#       
#       if(!("chr.exposure" %in% colnames(df))){
#         df$chr.exposure = sub(":.*","",df$variant)
#         df$pos.exposure = sub(".*:(.*)_.*_.*","\\1",df$variant)
#       }
# 
#       df <- df |> dplyr::select(SNP,
#                                 CHR = chr.exposure,
#                                 POS = pos.exposure,
#                                 EA = effect_allele.exposure,
#                                 OA = other_allele.exposure,
#                                 EAF = eaf.exposure,
#                                 beta = beta.exposure,
#                                 P = pval.exposure,
#                                 exposure,
#                                 cis_trans,
#                                 proximal_gene = gene)
#       return(df)
#     })
# 
#     dfs_join <- do.call("rbind", dfs) |> unique()
#     rownames(dfs_join) <- NULL
#     dfs_join
#   })
#   names(out) <- names_IVsets
#   out
# })
# names(pqtls) <- names_genes

#save(pqtls, file = file.path(data_dir,"sumstats/proteomics/protein_pQTLs_5e-08.rda"))

load(file = file.path(data_dir,"sumstats/proteomics/protein_pQTLs_5e-08.rda")) #pqtls
# 
# # Significant pQTLs, combined across all exposures, for each instrument sets
# pqtls_joined <- lapply(seq_len(n_IVsets), function(set){
#   dfs <- lapply(pqtls, function(x){
#     out <- x[[set]]
#     out$IVset <- names_IVsets[set]
#     return(out)
#   })
#   dfs_join <- do.call("rbind", dfs) |> dplyr::arrange(SNP)
#   rownames(dfs_join) <- NULL
#   return(na.exclude(dfs_join))
# })
# 
# names(pqtls_joined) <- names_IVsets
# 
# # Derive a set of variants that are pQTLs for at least one protein to look up in the raw summary statistics of all others (split by IV sets)
# snp_lookup <- lapply(pqtls_joined, function(x){
#   unique(x$SNP)
# })
# # Format as in summary stats
# snp_lookup[grepl("mask", names(snp_lookup))] <- lapply(snp_lookup[grepl("mask", names(snp_lookup))], function(x){toupper(x)})
# snp_lookup[grepl("exwas_variant", names(snp_lookup))] <- lapply(snp_lookup[grepl("exwas_variant", names(snp_lookup))], 
#                                                                 function(x){gsub(":","-",gsub("_","-",toupper(x)))})
# #save(snp_lookup, file = file.path(data_dir,"sumstats/proteomics/protein_pQTLstolookup_5e-08.rda"))
# # Variant lookup performed as in 002_lookuppQTLs.R (on P1 server)
Code
pqtls_joined <- sapply(IV_sets, function(x){do.call(rbind, lapply(pqtls, `[[`, x))} |> na.exclude(), simplify = F)
pqtls_joined_variants <- do.call("rbind", pqtls_joined[grep("variant", names(pqtls_joined))]) |>
  tibble::rownames_to_column("IV_set") |>
  dplyr::mutate(IV_set = sub("\\..*","",IV_set))

p3 <- pqtls_joined_variants |> 
  dplyr::mutate(IV_set = factor(IVsets[IV_set], levels = IVsets)) |> 
  ggplot(aes(x = IV_set, y = abs(beta))) +
  geom_boxplot(aes(fill = IV_set)) +
  scale_fill_manual(values = cols_IVsets) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 55, hjust = 1),
        legend.position = "none") +
  labs(x = "", y = "Instrument |exposure beta|")

p4 <- pqtls_joined_variants |> dplyr::select(-IV_set) |> unique() |>
  dplyr::mutate(MAF = ifelse(EAF>0.5, 1-EAF,EAF),
                beta_MA = ifelse(EAF>0.5, beta*-1, beta)) |>
  ggplot(aes(x = MAF , y = beta_MA)) +
  geom_point(size = 0.5, alpha = 0.3) +
  geom_vline(xintercept = c(0.01,0.001), col = "red", lty = 2, lwd = 0.5) +
  theme_minimal() +
  labs(x = "MAF", y = "Instrument exposure beta")

gridExtra::grid.arrange(p3,p4,nrow = 1, widths = c(0.2,0.7), top = "Observed effect size of IVs for protein expression across instrument sets")

Pleiotropy profiles

Cis and trans plieotropy profiles

On average, how many molecular traits (protein levels) does a variant significantly associate with across instrument sets (e.g. ultra-rare, rare, common)?

To assess the pleiotropy profiles of variants in each IV set, all significant pQTL associations reported with a given variant (p<=5e-8) were extracted from the O-link protein GWAS/ExWAS summary statistics.

Code
lookup_files <- list.files(file.path(data_dir, "sumstats/proteomics"), pattern = "pQTL_lookup_*", full.names = TRUE)

lookup_masks_files <- list.files(file.path(data_dir, "sumstats/proteomics"), pattern = "pQTL_lookup_dhindsa_exwas_mask_*", full.names = TRUE)
lookup_exome_files <- list.files(file.path(data_dir, "sumstats/proteomics"), pattern = "pQTL_lookup_dhindsa_exwas_variant_*", full.names = TRUE)
lookup_genome_files <- list.files(file.path(data_dir, "sumstats/proteomics"), pattern = "pQTL_lookup_sun_gwas_variant_*", full.names = TRUE)

# Pleiotropy matrices (for each IV set)
# How many proteins do IVs associated with?
lookup_pQTLs <- lapply(lookup_files, function(x){data.table::fread(x)})
names(lookup_pQTLs) <- sub(".*pQTL_lookup_(.*)\\.tsv","\\1",lookup_files)

# Format to keep significant variant-protein associations
lookup_pQTLs <- lapply(lookup_pQTLs, function(x){
  x <- as.data.frame(x)
  if(!("GENE" %in% colnames(x))){x$GENE <- NA}
  
  out <- data.frame(variant = sub(".*\\.tsv:","",x[,1]),
                    protein_target = x$protein_target,
                    proximal_gene = x$GENE,
                    pvalue = x[,which(colnames(x) %in% c("P","pValue"))],
                    cis_trans = x$cis_trans) |>
    dplyr::arrange(variant)
  
  return(out)})

# Retain associations at 5x10^-8 (note gene masks are already thresholded at 1x10^-8)
lookup_pQTLs <- lapply(lookup_pQTLs, function(x){
  x |> dplyr::filter(pvalue <= 5e-8 & !(cis_trans == "")) # Exclude composite protein targets with no cis_trans assignment
})

# Summarise associations per variant (pQTL)
pQTLs_assocs <- mapply(function(x, y){
  x <- x |> dplyr::group_by(variant, proximal_gene) |> 
    dplyr::summarise(n_assocs = n(), 
                     n_cis = length(which(cis_trans == "cis")),
                     n_trans = length(which(cis_trans == "trans")))
  x$IV_set <- y
  return(x)
}, x = lookup_pQTLs, y = as.list(names(lookup_pQTLs)), SIMPLIFY = F)
Code
pqtls_assocs_joined <- do.call("rbind", pQTLs_assocs) |>
  dplyr::filter(grepl("variant",IV_set)) |>
  dplyr::mutate(IV_set = factor(IVsets[IV_set], levels = IVsets)) |>
  dplyr::group_by(IV_set) |> 
  dplyr::mutate(label = ifelse(n_trans == max(n_trans) & grepl("exome",IV_set), paste(variant,proximal_gene), NA)) |> dplyr::ungroup()

p5 <- pqtls_assocs_joined  |>
  ggplot(aes(x = IV_set, y = n_assocs)) +
  geom_boxplot(aes(fill = IV_set)) +
  scale_fill_manual(values = cols_IVsets) +
  scale_y_log10() +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 55, hjust = 1),
        legend.position = "none") +
  labs(x = "", y = "Number of pQTL associations")

p6 <- pqtls_assocs_joined  |>
  ggplot(aes(x = n_cis, y = n_trans, colour = IV_set)) +
  facet_wrap(.~IV_set, nrow = 2) +
  geom_point() +
  ggrepel::geom_text_repel(aes(label = label), size = 3) +
  scale_colour_manual(values = cols_IVsets) +
  theme_minimal() +
  theme(legend.position = "none") +
  labs(x = "Number of cis-pQTL associations", y = "Number of trans-pQTL associations")

gridExtra::grid.arrange(p5,p6,nrow = 1, widths = c(0.2,0.7), top = "Pleiotropy profile of IV sets")

Causal effect estimates

Proportion of instrumented proteins with significant IVW causal effects detected across instrument sets

Restricting to only those proteins that could be successfully instrumented (with n >= 3 IVs) by all strategies shown below, a significant causal effect of protein levels on a given outcome tend to be more often detected using common cis-IVs from GWAS than rare and ultra-rare cis-IVs. Here, significance has been defined at two thresholds, the liberal threshold of p<=0.05 and the conservative Bonferroni adjusted significance threshold given n=2937 independent proteins tested (p<=1.7x10^-5).

Code
# Calculate the proportion of significant exposures across variant sets
# Keep proteins instrumentable by all variant-based IV sets
prop_sig_IVs <- function(res_summary, p_thresh){
  
  df_p <- res_summary$min_p[,names(variant_IVsets_indep)] |>
    na.exclude() |>
    as.data.frame() |>
    tibble::rownames_to_column("row") |>
    dplyr::mutate(gene = sub("_.*","",row),
                  outcome = sub(".*_","",row)) |>
    tidyr::pivot_longer(2:6, names_to = "IV_set", values_to = "pval")
  
  df_beta <- res_summary$min_p_beta[,names(variant_IVsets_indep)] |>
    na.exclude() |>
    as.data.frame() |>
    tibble::rownames_to_column("row") |>
    dplyr::mutate(gene = sub("_.*","",row),
                  outcome = sub(".*_","",row)) |>
    tidyr::pivot_longer(2:6, names_to = "IV_set", values_to = "beta")
  
  df <- left_join(df_p, df_beta) |>
    dplyr::mutate(call = ifelse(pval <= p_thresh & beta > 0, "sig_up",
                                ifelse(pval <= p_thresh & beta < 0, "sig_down", "ns")))
  
  df_grouped <- df |> 
    dplyr::group_by(IV_set,call) |>
    dplyr::summarise(n_genes = n()) |>
    dplyr::mutate(prop = n_genes/sum(n_genes) * 100,
                  outcome = unique(df$outcome),
                  n_genes_tot = sum(n_genes))
  
  return(df_grouped)
}

sig_IVs <- do.call("rbind", lapply(summary_sig, prop_sig_IVs, p_thresh = alpha))
sig_IVs_cis <- do.call("rbind", lapply(summary_sig_cis, prop_sig_IVs, p_thresh = alpha))
sig_IVs_trans <- do.call("rbind", lapply(summary_sig_trans, prop_sig_IVs, p_thresh = alpha))

sig_IVs_0.01 <- do.call("rbind", lapply(summary_sig, prop_sig_IVs, p_thresh = 0.01))
sig_IVs_0.01_cis <- do.call("rbind", lapply(summary_sig_cis, prop_sig_IVs, p_thresh = 0.01))
sig_IVs_0.01_trans <- do.call("rbind", lapply(summary_sig_trans, prop_sig_IVs, p_thresh = 0.01))

# Plot
p3 <- mapply(function(x,y){
  x |> dplyr::filter(call %in% c("sig_up","sig_down")) |>
    dplyr::mutate(label = factor(variant_IVsets_indep[IV_set], levels = variant_IVsets_indep)) |>
    ggplot(aes(x = label, y = prop, fill = call)) +
    facet_grid(.~outcome) +
    geom_bar(position = "stack", stat = "identity") +
    scale_fill_manual(name = "Direction of effect", values = c("steelblue","tomato"), label = c("down","up")) +
    theme_minimal() +
    theme(axis.text.x = element_text(angle = 45, hjust = 1),
          plot.title = element_text(hjust = 0.5),
          legend.position = "bottom",
          plot.margin = margin(t = 5, r = 5, b = 5, l = 30)) +
    labs(x = "IV set",
         y = "%", 
         title = paste0("Percentage of plasma protein causally associated with outcome using ", y, "IVs (at P<=0.01)"))
}, x = list(sig_IVs_0.01, sig_IVs_0.01_cis, sig_IVs_0.01_trans), y = c("all ","cis-","trans-"), SIMPLIFY = FALSE)

gridExtra::grid.arrange(grobs = p3)

p4 <- mapply(function(x,y){
  x |> dplyr::filter(call %in% c("sig_up","sig_down")) |>
    dplyr::mutate(label = factor(variant_IVsets_indep[IV_set], levels = variant_IVsets_indep)) |>
    ggplot(aes(x = label, y = prop, fill = call)) +
    facet_grid(.~outcome) +
    geom_bar(position = "stack", stat = "identity") +
    scale_fill_manual(name = "Direction of effect", values = c("steelblue","tomato"), label = c("down","up")) +
    theme_minimal() +
    theme(axis.text.x = element_text(angle = 45, hjust = 1),
          plot.title = element_text(hjust = 0.5),
          legend.position = "bottom",
          plot.margin = margin(t = 5, r = 5, b = 5, l = 30)) +
    labs(x = "IV set",
         y = "%", 
         title = paste0("Percentage of plasma protein causally associated with outcome using ", y, "IVs (at P<=1.7x10^-5)"))
}, x = list(sig_IVs, sig_IVs_cis, sig_IVs_trans), y = c("all ","cis-","trans-"), SIMPLIFY = FALSE)

gridExtra::grid.arrange(grobs = p4)
Figure 3
Figure 4

Comparison of IVW effect estimates across instrument sets

Code
# Functions
# Generate a table of IVW estimates for a pair of instrument sets
compare_estimates <- function(results_MR, IVset1, IVset2, n_snps=3, cis_trans="all"){
  # Pull out IVW estimates and SE for instrument sets
  if(cis_trans == "all"){
    df <- lapply(results_MR, function(x){
      x |> dplyr::filter(method == "Inverse variance weighted", 
                         cis_trans == "cis_trans",
                         IV_set %in% c(IVset1,IVset2))
    })
  }
  if(cis_trans == "cis"){
    df <- lapply(results_MR, function(x){
      x |> dplyr::filter(method == "Inverse variance weighted", 
                         cis_trans == "cis",
                         IV_set %in% c(IVset1,IVset2))
    })
  }
  if(cis_trans == "trans"){
    df <- lapply(results_MR, function(x){
      x |> dplyr::filter(method == "Inverse variance weighted", 
                         cis_trans == "trans",
                         IV_set %in% c(IVset1,IVset2))
    })
  }
  df_out_b <- do.call("rbind",df) |> 
    dplyr::filter(nsnp >= n_snps) |>
    tidyr::pivot_wider(names_from = "IV_set", values_from = c("b"), id_cols = "exposure") |>
    na.exclude() |>
    dplyr::relocate(exposure,IVset1,IVset2)
  
  df_out_se <- do.call("rbind",df) |> 
    dplyr::filter(nsnp >= n_snps) |>
    tidyr::pivot_wider(names_from = "IV_set", values_from = "se", id_cols = "exposure") |>
    na.exclude() |>
    dplyr::relocate(exposure,IVset1,IVset2)
  
  df_out_p <- do.call("rbind",df) |> 
    dplyr::filter(nsnp >= n_snps) |>
    tidyr::pivot_wider(names_from = "IV_set", values_from = "pval", id_cols = "exposure") |>
    na.exclude() |>
    dplyr::relocate(exposure,IVset1,IVset2)
  colnames(df_out_p)[2:3] <- paste0(colnames(df_out_p)[2:3],".pval")
  
  ## Caluclate effect heterogeneity
  if (!(identical(df_out_b$exposure, df_out_se$exposure) &&
        identical(colnames(df_out_b), colnames(df_out_se)))) {
    message("Error - mismatch of beta and SE dataframes")
    return(NULL)
  }
  
  beta_vecs <- lapply(split(df_out_b[,c(2,3)], 1:nrow(df_out_b)), as.numeric)
  se_vecs <- lapply(split(df_out_se[,c(2,3)], 1:nrow(df_out_se)), as.numeric)
  
  het_res <- do.call("rbind", mapply(effect_heterogeneity, beta_vec = beta_vecs, se_vec = se_vecs, SIMPLIFY = F)) |>
    as.data.frame() |>
    dplyr::mutate(exposure = df_out_b$exposure, .before = "Q") |>
    dplyr::mutate(Qpval_bon = p.adjust(Qpval, method = "bonferroni"),
                  Qpval_fdr = p.adjust(Qpval, method = "fdr"))
  
  df_out <- dplyr::left_join(df_out_b, df_out_se, by = "exposure", suffix = c(".beta",".se")) |>
    dplyr::left_join(df_out_p, by = "exposure") |>
    dplyr::left_join(het_res, by = "exposure")
  
  return(df_out)
}

# Generate a table of cis vs trans IVW estimates for given instrument set
compare_cistrans <- function(results_MR, IVset, n_snps=3){
  # Pull out IVW estimates and SE for instrument sets
  df <- lapply(results_MR, function(x){
    x |> 
      dplyr::filter(method == "Inverse variance weighted",
                    IV_set %in% IVset,
                    nsnp >= n_snps,
                    cis_trans %in% c("cis","trans")) |>
      dplyr::mutate(IV_set = paste(IV_set, cis_trans, sep = "_")) |>
      dplyr::arrange(cis_trans)})
  df <- df[lapply(df, nrow) == 2]
  
  df_out_b <- do.call("rbind",df) |> 
    tidyr::pivot_wider(names_from = "IV_set", values_from = "b", id_cols = "exposure") |>
    na.exclude()
  
  df_out_se <- do.call("rbind",df) |> 
    tidyr::pivot_wider(names_from = "IV_set", values_from = "se", id_cols = "exposure") |>
    na.exclude()
  
  df_out_p <- do.call("rbind",df) |> 
    tidyr::pivot_wider(names_from = "IV_set", values_from = "pval", id_cols = "exposure") |>
    na.exclude()
  colnames(df_out_p)[2:3] <- paste0(colnames(df_out_p)[2:3],".pval")
  
  ## Caluclate effect heterogeneity
  if (!(identical(df_out_b$exposure, df_out_se$exposure) &&
        identical(colnames(df_out_b), colnames(df_out_se)))) {
    message("Error - mismatch of beta and SE dataframes")
    return(NULL)
  }
  
  beta_vecs <- lapply(split(df_out_b[,c(2,3)], 1:nrow(df_out_b)), as.numeric)
  se_vecs <- lapply(split(df_out_se[,c(2,3)], 1:nrow(df_out_se)), as.numeric)
  
  het_res <- do.call("rbind", mapply(effect_heterogeneity, beta_vec = beta_vecs, se_vec = se_vecs, SIMPLIFY = F)) |>
    as.data.frame() |>
    dplyr::mutate(exposure = df_out_b$exposure, .before = "Q") |>
    dplyr::mutate(Qpval_bon = p.adjust(Qpval, method = "bonferroni"),
                  Qpval_fdr = p.adjust(Qpval, method = "fdr"))
  
  df_out <- dplyr::left_join(df_out_b, df_out_se, by = "exposure", suffix = c(".beta",".se")) |>
    dplyr::left_join(df_out_p, by = "exposure") |>
    dplyr::left_join(het_res, by = "exposure")
  
  return(df_out)
}

plot_estimates <- function(estimates, name){
  estimates <- as.data.frame(estimates)
  col_names <- names(estimates)
  
  if(any(grepl("cis|trans",col_names))){
    lab_x <- "cis-IVs"
    lab_y <- "trans-IVs"
  } else {
    lab_x <- IVsets[sub("\\.beta","",col_names[2])]
    lab_y <- IVsets[sub("\\.beta","",col_names[3])]
  }
  
  CI_x <- data.frame(LCI = estimates[,2] - 1.96*estimates[,4], UCI = estimates[,2] + 1.96*estimates[,4])
  CI_y <- data.frame(LCI = estimates[,3] - 1.96*estimates[,5], UCI = estimates[,3] + 1.96*estimates[,5])
  
  mod_lm <- lm(estimates[,3] ~ estimates[,2])
  slope <- round(coefficients(mod_lm)[2],2)
  
  flag_point <- estimates$Qpval_bon <= 0.05
  
  range_min <- min(min(estimates[,2]), min(estimates[,3])) #min(na.exclude(errorbar_xmin, errorbar_ymin)))
  range_max <- max(max(estimates[,2]), max(estimates[,3])) #min(na.exclude(errorbar_xmax, errorbar_ymax)))
  
  estimates_plot <- estimates |>
    dplyr::mutate(exposure = factor(exposure, levels = estimates$exposure[order(estimates$Qpval, decreasing = TRUE)]),
                  errorbar_xmin = ifelse(flag_point, CI_x$LCI, NA),
                  errorbar_xmax = ifelse(flag_point, CI_x$UCI, NA),
                  errorbar_ymin = ifelse(flag_point, CI_y$LCI, NA),
                  errorbar_ymax = ifelse(flag_point, CI_y$UCI, NA)) |>
    dplyr::arrange(exposure)
  
  estimates_plot |>
    ggplot(aes_string(x = col_names[2], y = col_names[3])) +
    geom_abline(slope = 1, intercept = 0, colour = "steelblue", lty = 2, lwd = 0.5) +
    geom_point(aes(colour = estimates_plot$Qpval_bon <= 0.05), size = 2, shape = 16) +  
    geom_errorbar(xmin = estimates_plot$errorbar_xmin,
                  xmax = estimates_plot$errorbar_xmax, colour = "tomato") +
    geom_errorbar(ymin = estimates_plot$errorbar_ymin,
                  ymax = estimates_plot$errorbar_ymax, colour = "tomato") +
    geom_smooth(method = "lm", se = FALSE, colour = "tomato", lwd = 0.5) +
    annotate(geom = "text", x = range_min/2, y = range_max, label = paste0("Slope=",slope), size = 3, colour = "tomato") +
    scale_colour_manual(values = c("black","tomato")) +
    scale_x_continuous(limits = c(range_min, range_max)) +
    scale_y_continuous(limits = c(range_min, range_max)) +
    theme_minimal() +
    theme(plot.title = element_text(hjust = 0.5, size = 10), legend.position = "none") +
    labs(x=lab_x, y=lab_y, title = name)
}

# Upset plot comparing signifiant causal estimates across variant sets
# Restrtcited to those proteins that could be instrumented with >= 3 variants across the comparison sets

plot_upset <- function(IVsets_list, cis_trans = "all", pval = 0.01){
  
  if(cis_trans == "all"){inlist = summary_sig}
  if(cis_trans == "cis"){inlist = summary_sig_cis}
  if(cis_trans == "trans"){inlist = summary_sig_trans}
  
  set_sig <- lapply(inlist, function(x){
    dat <- x$min_p[,IVsets_list]
    # Extract only those tests that could be conducted in all IV sets
    tests <- rownames(dat) # Exposure_outcome
    shared_tests <- tests[apply(dat, MARGIN = 1, function(x){sum(!(is.na(x)))}) == 2]
    dat <- dat[shared_tests,]
    dat_sig <- dat <= pval
    
    tests <- rownames(dat_sig)
    list_sig <- apply(dat_sig, MARGIN = 2, FUN = function(x){tests[x]}, simplify = F) # Names of significant exposures
    names(list_sig) <- IVsets[names(list_sig)]
    list_sig <- list_sig[lapply(list_sig, length) > 0] # Remove empty exposure sets
    return(list_sig)
  })
  
  # Upset plots
  plots <- mapply(FUN = function(list_sig, outcome){
    if(length(list_sig) == 1){return(NULL)}
    upset(fromList(list_sig), order.by = "freq", text.scale = 1.5, mainbar.y.label = paste0("Protein:",outcome,"\n (IVW p<", pval, ")"))
  }, list_sig = set_sig, outcome = names(set_sig), SIMPLIFY = F)
  
  plots <- plots[!sapply(plots, is.null)]
  
  # for (v in names(plots)) {
  #   print(plots[[v]])
  #   #grid.text(v, x = 0.65, y=0.97, gp = gpar(fontsize = 20))
  #   grid.edit('arrange', name = v)
  #   vp <- grid.grab()
  #   plots[[v]] <- vp
  # }
  return(plots)
}

1a) Common variant cis-IVs (finemapped vs clumped)
1b) Common variant trans-IVs (finemapped vs clumped)

2a) Common variant cis-IVs (GWAS vs ExWAS)
2b) Common variant trans-IVs (GWAS vs ExWAS)

Code
comp_IVWs_common_1a <- lapply(res_MR, FUN = compare_estimates, 
                              IVset1 = "sun_gwas_variant_common_clumped",
                              IVset2 = "sun_gwas_variant_common_finemapped",
                              n_snps = 3, cis_trans = "cis")

gridExtra::grid.arrange(grobs = mapply(plot_estimates, estimates = comp_IVWs_common_1a, name = names(comp_IVWs_common_1a), SIMPLIFY = FALSE), nrow = 2,
                        top = "Comparison of IVW estimates from common cis-pQTL instruments (finemapped vs clumped)")

comp_IVWs_common_1b <- lapply(res_MR, FUN = compare_estimates, 
                              IVset1 = "sun_gwas_variant_common_clumped",
                              IVset2 = "sun_gwas_variant_common_finemapped",
                              n_snps = 3, cis_trans = "trans")

gridExtra::grid.arrange(grobs = mapply(plot_estimates, estimates = comp_IVWs_common_1b, name = names(comp_IVWs_common_1b), SIMPLIFY = FALSE), nrow = 2,
                        top = "Comparison of IVW estimates from common trans-pQTL instruments (finemapped vs clumped)")

comp_IVWs_common_2a <- lapply(res_MR, FUN = compare_estimates, 
                              IVset1 = "dhindsa_exwas_variant_common",
                              IVset2 = "sun_gwas_variant_common_clumped",
                              n_snps = 3, cis_trans = "cis")

gridExtra::grid.arrange(grobs = mapply(plot_estimates, estimates = comp_IVWs_common_2a, name = names(comp_IVWs_common_2a), SIMPLIFY = FALSE), nrow = 2,
                        top = "Comparison of IVW estimates from GWAS common and WES common cis-pQTL instrumentation strategies")

comp_IVWs_common_2b <- lapply(res_MR, FUN = compare_estimates, 
                              IVset1 = "dhindsa_exwas_variant_common",
                              IVset2 = "sun_gwas_variant_common_clumped",
                              n_snps = 3, cis_trans = "trans")

gridExtra::grid.arrange(grobs = mapply(plot_estimates, estimates = comp_IVWs_common_2b, name = names(comp_IVWs_common_2b), SIMPLIFY = FALSE), nrow = 2,
                        top = "Comparison of IVW estimates from GWAS common and WES common trans-pQTL instrumentation strategies")
Figure 5
Figure 6
Figure 7
Figure 8

3a) Common vs rare variant cis-IVs (ExWAS)
3b) Common vs rare variant trans-IVs (ExWAS)

4a) Common vs ultra-rare variant cis-IVs (ExWAS)
4b) Common vs ultra-rare variant trans-IVs (ExWAS)

Code
comp_IVWs_3a <- lapply(res_MR, FUN = compare_estimates, 
                       IVset1 = "dhindsa_exwas_variant_common",
                       IVset2 = "dhindsa_exwas_variant_rare",
                       n_snps = 3,
                       cis_trans = "cis")

gridExtra::grid.arrange(grobs = mapply(plot_estimates, estimates = comp_IVWs_3a, name = names(comp_IVWs_3a), SIMPLIFY = FALSE), nrow = 2,
                        top = "Comparison of IVW estimates from WES common and rare cis-pQTL instrumentation strategies")

comp_IVWs_3b <- lapply(res_MR, FUN = compare_estimates, 
                       IVset1 = "dhindsa_exwas_variant_common",
                       IVset2 = "dhindsa_exwas_variant_rare",
                       n_snps = 3,
                       cis_trans = "trans")

gridExtra::grid.arrange(grobs = mapply(plot_estimates, estimates = comp_IVWs_3b, name = names(comp_IVWs_3b), SIMPLIFY = FALSE), nrow = 2,
                        top = "Comparison of IVW estimates from WES common and rare trans-pQTL instrumentation strategies")

comp_IVWs_4a <- lapply(res_MR, FUN = compare_estimates, 
                       IVset1 = "dhindsa_exwas_variant_common",
                       IVset2 = "dhindsa_exwas_variant_ultrarare",
                       n_snps = 3,
                       cis_trans = "cis")

gridExtra::grid.arrange(grobs = mapply(plot_estimates, estimates = comp_IVWs_4a, name = names(comp_IVWs_4a), SIMPLIFY = FALSE), nrow = 2,
                        top = "Comparison of IVW estimates from WES common and ultra-rare cis-pQTL instrumentation strategies")

comp_IVWs_4b <- lapply(res_MR, FUN = compare_estimates, 
                       IVset1 = "dhindsa_exwas_variant_common",
                       IVset2 = "dhindsa_exwas_variant_ultrarare",
                       n_snps = 3,
                       cis_trans = "trans")

gridExtra::grid.arrange(grobs = mapply(plot_estimates, estimates = comp_IVWs_4b, name = names(comp_IVWs_4b), SIMPLIFY = FALSE), nrow = 2,
                        top = "Comparison of IVW estimates from WES common and ultra-rare trans-pQTL instrumentation strategies")
Figure 9
Figure 10
Figure 11
Figure 12

5a) Common (GWAS) vs rare variant cis-IVs (ExWAS)
5b) Common (GWAS) vs rare variant trans-IVs (ExWAS)

6a) Common (GWAS) vs ultra-rare variant cis-IVs (ExWAS)
6b) Common (GWAS) vs ultra-rare variant trans-IVs (ExWAS)

Code
comp_IVWs_5a <- lapply(res_MR, FUN = compare_estimates, 
                       IVset1 = "sun_gwas_variant_common_clumped",
                       IVset2 = "dhindsa_exwas_variant_rare",
                       n_snps = 3,
                       cis_trans = "cis")

gridExtra::grid.arrange(grobs = mapply(plot_estimates, estimates = comp_IVWs_5a, name = names(comp_IVWs_5a), SIMPLIFY = FALSE), nrow = 2,
                        top = "Comparison of IVW estimates from GWAS common and WES rare cis-pQTL instrumentation strategies")

comp_IVWs_5b <- lapply(res_MR, FUN = compare_estimates, 
                       IVset1 = "sun_gwas_variant_common_clumped",
                       IVset2 = "dhindsa_exwas_variant_rare",
                       n_snps = 3,
                       cis_trans = "trans")

gridExtra::grid.arrange(grobs = mapply(plot_estimates, estimates = comp_IVWs_5b, name = names(comp_IVWs_5b), SIMPLIFY = FALSE), nrow = 2,
                        top = "Comparison of IVW estimates from GWAS  common and WES rare trans-pQTL instrumentation strategies")

comp_IVWs_6a <- lapply(res_MR, FUN = compare_estimates, 
                       IVset1 = "sun_gwas_variant_common_clumped",
                       IVset2 = "dhindsa_exwas_variant_ultrarare",
                       n_snps = 3,
                       cis_trans = "cis")

gridExtra::grid.arrange(grobs = mapply(plot_estimates, estimates = comp_IVWs_6a, name = names(comp_IVWs_6a), SIMPLIFY = FALSE), nrow = 2,
                        top = "Comparison of IVW estimates from GWAS common and WES ultra-rare cis-pQTL instrumentation strategies")

comp_IVWs_6b <- lapply(res_MR, FUN = compare_estimates, 
                       IVset1 = "sun_gwas_variant_common_clumped",
                       IVset2 = "dhindsa_exwas_variant_ultrarare",
                       n_snps = 3,
                       cis_trans = "trans")

gridExtra::grid.arrange(grobs = mapply(plot_estimates, estimates = comp_IVWs_6b, name = names(comp_IVWs_6b), SIMPLIFY = FALSE), nrow = 2,
                        top = "Comparison of IVW estimates from GWAS common and WES ultra-rare trans-pQTL instrumentation strategies")

# Extract list of genes with significant heterogenity between cis common and rare/ultrarare variant estimates
het_genes <- lapply(
  Map(c, lapply(comp_IVWs_3a, function(x){x |> dplyr::filter(Qpval_bon <= 0.1) |> dplyr::pull(exposure)}),
      lapply(comp_IVWs_4a, function(x){x |> dplyr::filter(Qpval_bon <= 0.1) |> dplyr::pull(exposure)}),
      lapply(comp_IVWs_5a, function(x){x |> dplyr::filter(Qpval_bon <= 0.1) |> dplyr::pull(exposure)}),
      lapply(comp_IVWs_6a, function(x){x |> dplyr::filter(Qpval_bon <= 0.1) |> dplyr::pull(exposure)})), function(x){unique(x)})

het_genes <- mapply(FUN = function(x,y){paste(x,y,sep = "_")}, x = het_genes, y = names(het_genes), SIMPLIFY = F)
Figure 13
Figure 14
Figure 15
Figure 16

Overlap between protein:outcome pairs with significant causal effectes estimated across cis-IV sets

Code
plot_upset(IVsets_list = c("sun_gwas_variant_common_clumped","dhindsa_exwas_variant_rare","dhindsa_exwas_variant_ultrarare"), cis_trans = "cis", pval = 0.05)
$BMI


$CorpVol


$HbA1c


$IGF1


$LDL


$PlatVol


$RBCcount


$SBP


$Trig


$VitD


$WHRadjBMI

  1. Common cis-IVs vs trans-IVs (GWAS)
  2. Common cis-IVs vs trans-IVs (ExWAS)
  3. Rare cis-IVs vs trans-IVs (ExWAS)
  4. Ultra-rare cis-IVs vs trans-IVs
Code
comp_IVWs_7 <- lapply(res_MR, FUN = compare_cistrans, 
                      IVset = "sun_gwas_variant_common_clumped",
                      n_snps = 3)

gridExtra::grid.arrange(grobs = mapply(plot_estimates, estimates = comp_IVWs_7, name = names(comp_IVWs_7), SIMPLIFY = FALSE), nrow = 2,
                        top = "Comparison of IVW estimates from GWAS common and cis- and trans-pQTL instrumentation strategies")

comp_IVWs_8 <- lapply(res_MR, FUN = compare_cistrans, 
                      IVset = "dhindsa_exwas_variant_common",
                      n_snps = 3)

gridExtra::grid.arrange(grobs = mapply(plot_estimates, estimates = comp_IVWs_8, name = names(comp_IVWs_8), SIMPLIFY = FALSE), nrow = 2,
                        top = "Comparison of IVW estimates from ExWAS common and cis- and trans-pQTL instrumentation strategies")

comp_IVWs_9 <- lapply(res_MR, FUN = compare_cistrans, 
                      IVset = "dhindsa_exwas_variant_rare",
                      n_snps = 3)

gridExtra::grid.arrange(grobs = mapply(plot_estimates, estimates = comp_IVWs_9, name = names(comp_IVWs_9), SIMPLIFY = FALSE), nrow = 2,
                        top = "Comparison of IVW estimates from ExWAS rare and cis- and trans-pQTL instrumentation strategies")

comp_IVWs_10 <- lapply(res_MR, FUN = compare_cistrans, 
                       IVset = "dhindsa_exwas_variant_ultrarare",
                       n_snps = 3)

gridExtra::grid.arrange(grobs = mapply(plot_estimates, estimates = comp_IVWs_10, name = names(comp_IVWs_10), SIMPLIFY = FALSE), nrow = 2,
                        top = "Comparison of IVW estimates from ExWAS ultra-rare and cis- and trans-pQTL instrumentation strategies")

# Extract list of genes with significant heterogenity between cis and trans estimates
het_genes_cistrans <- lapply(
  Map(c, lapply(comp_IVWs_7, function(x){x |> dplyr::filter(Qpval_bon <= 0.1) |> dplyr::pull(exposure)}),
      lapply(comp_IVWs_8, function(x){x |> dplyr::filter(Qpval_bon <= 0.1) |> dplyr::pull(exposure)}),
      lapply(comp_IVWs_9, function(x){x |> dplyr::filter(Qpval_bon <= 0.1) |> dplyr::pull(exposure)}),
      lapply(comp_IVWs_10, function(x){x |> dplyr::filter(Qpval_bon <= 0.1) |> dplyr::pull(exposure)})), function(x){unique(x)})

het_genes_cistrans <- mapply(FUN = function(x,y){paste(x,y,sep = "_")}, x = het_genes_cistrans , y = names(het_genes_cistrans), SIMPLIFY = F)
Figure 17
Figure 18
Figure 19
Figure 20

Heterogenity in pQTL effects

Restricting to those protein exposures that show signficiant causal effects on a given outcome (at p <= 0.05 with >= 3 IVs), and significant heterogenity between rare and common cis variant IV sets (at Qstat P(FDR) < 0.05):

Code
# Retain exposure-outcome pairs with significant IVW estimate for at least one instrumentation strategy (with n instruments >= n_snps)
res_MR_top <- lapply(res_MR, filter_results, p_thresh = 0.05, n_snps = 3)

# Filter to those with evidence of estimate heterogenity between variant sets
res_MR_top_het <- mapply(function(x,y){x[names(x) %in% y]}, x = res_MR_top, y = het_genes, SIMPLIFY = F)

# Filter to those with evidence of estimate heterogeneity between cis and trans IVs
res_MR_top_cistrans_het <- mapply(function(x,y){x[names(x) %in% y]}, x = res_MR_top, y = het_genes_cistrans, SIMPLIFY = F)

Heterogeneity in rare v common cis IV causal estimates

Code
# Plot estimates
res_IVW_cis <- lapply(res_MR_top_het, function(x){
  df <- do.call("rbind", x) |> 
    filter(method == "Inverse variance weighted",
           grepl("variant",IV_set),
           cis_trans == "cis") |>
    dplyr::mutate(instrument_class = factor(variant_IVsets[IV_set], levels = rev(variant_IVsets)))
  return(df)})

plot_forest <- function(results_IVW){
  pd <- ggplot2::position_dodge2(width = 0.8, preserve = "single")
  exposure <- unique(sub(".*_", "", results_IVW$pair))
  
  n_row <- ceiling(length(unique(results_IVW$exposure))/4)
  
  p <- ggplot(results_IVW, aes(x = b, y = instrument_class, colour = instrument_class)) +
    facet_wrap(.~pair, ncol = 5, scale = "free_x") +
    geom_vline(xintercept = 0, colour = "grey20", lty = 2) +
    geom_point(aes(size = nsnp), position = pd) +
    geom_errorbarh(aes(xmin = b-1.96*se, xmax = b+1.96*se, group = cis_trans), position = pd) +
    scale_colour_manual(values = cols_variant_IVsets) +
    theme_bw() + guides(colour = "none") +
    theme(plot.title = element_text(hjust = 0.5)) +
    labs(x = "IVW", y = "", size = "n (instruments)", title = paste("Outcome:", exposure))
  
  return(p)
}

lapply(res_IVW_cis, plot_forest)[c(1,10)]
Figure 21
Figure 22
Code
lapply(res_IVW_cis, plot_forest)[c(2,3,6,8,9,11)]
Figure 23
Figure 24
Figure 25
Figure 26
Figure 27
Figure 28
Code
lapply(res_IVW_cis, plot_forest)[c(4)]
Figure 29
Code
lapply(res_IVW_cis, plot_forest)[c(5)]
Figure 30
Code
lapply(res_IVW_cis, plot_forest)[c(7)]
Figure 31

Heterogeneity in cis v trans IV causal estimates

Code
# Plot estimates
res_IVW_cistrans <- lapply(res_MR_top_cistrans_het, function(x){
  df <- do.call("rbind", x) |> 
    filter(method == "Inverse variance weighted",
           grepl("variant",IV_set),
           cis_trans %in% c("cis","trans")) |>
    dplyr::mutate(instrument_class = factor(variant_IVsets[IV_set], levels = rev(variant_IVsets)))
  return(df)})

plot_forest_cistrans <- function(results_IVW){
  pd <- ggplot2::position_dodge2(width = 0.8, preserve = "single")
  exposure <- unique(sub(".*_", "", results_IVW$pair))
  
  n_row <- ceiling(length(unique(results_IVW$exposure))/4)
  
  p <- ggplot(results_IVW, aes(x = b, y = instrument_class, colour = instrument_class)) +
    facet_wrap(.~pair, ncol = 6, scale = "free_x") +
    geom_vline(xintercept = 0, colour = "grey20", lty = 2) +
    geom_point(aes(pch = cis_trans, size = nsnp), position = pd) +
    geom_errorbarh(aes(xmin = b-1.96*se, xmax = b+1.96*se, group = cis_trans), position = pd) +
    scale_colour_manual(values = cols_variant_IVsets) +
    scale_shape_manual(name = "", values = c(16,1)) +
    theme_bw() + guides(colour = "none") +
    theme(plot.title = element_text(hjust = 0.5)) +
    labs(x = "IVW", y = "", size = "n (instruments)", title = paste("Outcome:", exposure))
  
  return(p)
}

lapply(res_IVW_cistrans, plot_forest_cistrans)[c(1,6,9,11)]
Figure 32
Figure 33
Figure 34
Figure 35
Code
lapply(res_IVW_cistrans, plot_forest_cistrans)[c(2,3,5,7)]
Figure 36
Figure 37
Figure 38
Figure 39
Code
lapply(res_IVW_cistrans, plot_forest_cistrans)[c(4)]
Figure 40
Code
lapply(res_IVW_cistrans, plot_forest_cistrans)[c(8)]
Figure 41
Code
lapply(res_IVW_cistrans, plot_forest_cistrans)[c(10)]
Figure 42

Concordance vs discordance in cis and trans effects

How often do causal estimates for a protein:outcome effect derived from cis- and trans-pQTLs agree (in terms of significance and direction of effect)? [TO ADD]

Consistency in pQTL effects

Restricting to those protein exposures that show signficiant causal effects on a given outcome (at p <= 0.05 with >= 3 IVs) supported by rare and common cis variant IV sets (Qsat P(FDR) < 0.05):

Code
# Extract list of genes with consistent (significant) effects between cis common and rare/ultrarare variant estimates
consistant_genes <- lapply(
  Map(c, 
      lapply(comp_IVWs_3a, function(x){x |> dplyr::filter(Qpval_bon > 0.1 & x[,6] < 0.05 & x[,7] < 0.05 & sign(x[,2]) == sign(x[,3])) |> dplyr::pull(exposure)}),
      lapply(comp_IVWs_4a, function(x){x |> dplyr::filter(Qpval_bon > 0.1 & x[,6] < 0.05 & x[,7] < 0.05 & sign(x[,2]) == sign(x[,3])) |> dplyr::pull(exposure)}),
      lapply(comp_IVWs_5a, function(x){x |> dplyr::filter(Qpval_bon > 0.1 & x[,6] < 0.05 & x[,7] < 0.05 & sign(x[,2]) == sign(x[,3])) |> dplyr::pull(exposure)}),
      lapply(comp_IVWs_6a, function(x){x |> dplyr::filter(Qpval_bon > 0.1 & x[,6] < 0.05 & x[,7] < 0.05 & sign(x[,2]) == sign(x[,3])) |> dplyr::pull(exposure)})), function(x){unique(x)})

consistant_genes <- mapply(FUN = function(x,y){paste(x,y,sep = "_")}, x = consistant_genes, y = names(consistant_genes), SIMPLIFY = F)

# Filter to significant MR results supported across IV sets
res_MR_top_consistant <- mapply(function(x,y){x[names(x) %in% y]}, x = res_MR_top, y = consistant_genes, SIMPLIFY = F)

res_IVW_consistant <- lapply(res_MR_top_consistant, function(x){
  df <- do.call("rbind", x) |> 
    filter(method == "Inverse variance weighted",
           grepl("variant",IV_set),
           cis_trans == "cis") |>
    dplyr::mutate(instrument_class = factor(variant_IVsets[IV_set], levels = rev(variant_IVsets)))
  return(df)})
Code
lapply(res_IVW_consistant, plot_forest)[c(1,3,5,6,9,10)]
Figure 43
Figure 44
Figure 45
Figure 46
Figure 47
Figure 48
Code
lapply(res_IVW_consistant, plot_forest)[c(4,7)]
Figure 49
Figure 50
Code
lapply(res_IVW_consistant, plot_forest)[c(2,11)]
Figure 51
Figure 52
Code
lapply(res_IVW_consistant, plot_forest)[c(8)]
Figure 53

Allelic series

Assessing for monotonically increasing causal effect estimates (Wald Ratios) across rare-variant series of alleles:
- Rare variants (<1% MAF): benign missense variants (BMV) - damaging missense variants (DMVs) - protein truncating variants (PTVs)

Using the COAST-SS approach (developed for allelic series association tests), which has been developed to handle GWAS summary statistics, but here we are passing Wald Ratios (variant outcome effect/ variant protein effect)

What is needed here?:
- Summary statistics for all rare variants (capped at p<0.01 by the AzPheWAS portal for pQTLs) falling in the coding region of genes encoding the O’link plasma proteome assayed proteins
- The corresponding variant associations for outcomes of interest (with which to calculate Wald ratios)  - Variant predicted functional effects (with which to categorise SNPs)

Magnitude of variant class effects

Do cis-variants predicted to be more deleterious in nature (loss of function) actually have larger effects on protein expression?

Code
# Read in variant effect estimates, wald ratios and VEP annotations
load(file.path(res_dir,"results_molecular_waldratios_ciscodingvariants_VEPannotated.rda"))

wald_ratios <- lapply(wald_ratios, function(x){
  x$annots_relaxed <- as.factor(x$annots_relaxed)
  x})

vep_class <- c("1" = "syn./noncoding", 
               "2" = "benign missense",
               "3" = "del. missense",
               "4" = "putative LoF")

cols_vep <- c("1" = "steelblue", 
              "2" = "olivedrab3",
              "3" = "orange",
              "4" = "tomato")

# Plot variant consequnce call on cis pQTL effect
wald_ratios$LDL |> 
  dplyr::filter(!is.na(annots_relaxed)) |> 
  ggplot(aes(x = beta.exposure, 
             y = annots_relaxed)) +
  ggridges::geom_density_ridges(aes(fill = annots_relaxed), trim = TRUE) +
  geom_vline(xintercept = 0, colour ="grey30", linewidth = 1) + 
  scale_y_discrete(labels = vep_class) +
  scale_fill_manual(values = cols_vep) +
  theme_minimal() +
  theme(legend.position = "none") +
  labs(x = "cis-pQTL effect", y = "Variant annotation")

Code
# Plot wald ratios split by variant consequence
plot_wald_vep <- function(gene, outcome){
  dat <- wald_ratios[[outcome]] |> 
    dplyr::filter(exposure == gene) |>
    dplyr::arrange(annots_relaxed, wald_ratio_b) |>
    dplyr::mutate(SNP = factor(SNP, levels = SNP))
  
  res <- coastss_annots_relaxed[[outcome]] |>
    dplyr::filter(exposure == gene)
  
  p_theme <- theme_bw() +
    theme(plot.title = element_text(hjust = 0.5),
          plot.subtitle = element_text(hjust = 0.5),
          legend.position = "bottom",
          legend.key.height = unit(0.2, 'cm'))
  
  # Exposure
  p_exp <- ggplot(dat, aes(x = beta.exposure, y = SNP, colour = annots_relaxed)) +
    geom_vline(xintercept = 0, colour = "grey20", lty = 2) +
    geom_point() +
    scale_colour_manual(values = cols_vep, labels = vep_class, name = "") +
    geom_errorbarh(aes(xmin = beta.exposure-1.96*se.exposure, xmax = beta.exposure+1.96*se.exposure)) +
    labs(x = "Bx", y = "", 
         title = paste("Exposure:", gene),
         subtitle = "") +
    guides(colour = guide_legend(nrow=2, byrow=TRUE)) +
    p_theme
  
  # Outcome
  p_out <- ggplot(dat, aes(x = beta.outcome, y = SNP, colour = annots_relaxed)) +
    geom_vline(xintercept = 0, colour = "grey20", lty = 2) +
    geom_point() +
    scale_colour_manual(values = cols_vep, labels = vep_class, name = "") +
    geom_errorbarh(aes(xmin = beta.outcome-1.96*se.outcome, xmax = beta.outcome+1.96*se.outcome)) +
    labs(x = "By", y = "", 
         title = paste("Outcome:", outcome),
         subtitle = "") +
    guides(colour = guide_legend(nrow=2, byrow=TRUE)) +
    p_theme
  
  # Wald Ratio
  p_wald <- ggplot(dat, aes(x = wald_ratio_b, y = SNP, colour = annots_relaxed)) +
    geom_vline(xintercept = 0, colour = "grey20", lty = 2) +
    geom_point() +
    scale_colour_manual(values = cols_vep, labels = vep_class, name = "") +
    geom_errorbarh(aes(xmin = wald_ratio_b-1.96*wald_ratio_se, xmax = wald_ratio_b+1.96*wald_ratio_se)) +
    labs(x = "Wald Ratio (By/Bx)", y = "", 
         title = paste("Exposure:", gene, "Outcome:", outcome),
         subtitle = paste("COAST-SS p-value:", signif(res$alleleic_skat_p, 2))) +
    guides(colour = guide_legend(nrow=2, byrow=TRUE)) +
    p_theme
  
  return(gridExtra::grid.arrange(grobs = list(p_wald, p_exp, p_out), nrow = 1))
}

Allelic series analysis

Code
# Load results from COAST-SS allelic series anlysis 
# 1 = synonymous/noncoding, 2 = benign missense, 3 = deleterious missense, 4 = putative loss of funcion

load(file.path(res_dir,"results_molecular_COASTSS_relaxed.rda")) # coastss_annots_relaxed

## Filter out exposure genes for which there were insufficient variants for alleleic series analysis and calculate adjusted p-vals
coastss_annots_relaxed <- lapply(coastss_annots_relaxed, function(x){
  x |> 
    dplyr::filter(!is.na(alleleic_skat_p)) |>
    dplyr::mutate(adj_p = p.adjust(alleleic_skat_p, method = "BH"))
})

outcome_names <- c("LDL direct" = "LDL",
                    "Body mass index (BMI)" = "BMI",
                    "Vitamin D" = "Vitamin D",
                    "Triglycerides" = "Triglycerides",
                    "Glycated haemoglobin (HbA1c)" = "HbA1c",
                    "Mean platelet (thrombocyte) volume" = "PlatVol",
                    "IGF-1" = "IGF-1",
                    "Waist-to-hip ratio adjusted for body mass index (WHRadjBMI)" = "WHRadjBMI",
                    "Red blood cell (erythrocyte) count" = "RBCcount",
                    "Mean corpuscular volume" = "CorpVol",
                    "Systolic blood pressure, automated reading" = "SBP")
                    

do.call("rbind", coastss_annots_relaxed) |>
  dplyr::mutate(label = ifelse(adj_p < 0.05, exposure, NA),
                outcome = outcome_names[outcome]) |>
  ggplot(aes(x = exposure, y = -log10(alleleic_skat_p))) +
  facet_grid(outcome ~ ., scale = "free_y") +
  geom_point(aes(colour = adj_p < 0.05)) +
  scale_colour_manual(name = "P(FDR) < 0.05", values = c("grey","tomato")) +
  ggrepel::geom_text_repel(aes(label = label), size = 4, max.overlaps = Inf, force = 10) +
  theme_minimal() +
  theme(strip.text = element_text(size = 10),
        panel.grid = element_blank(),
        axis.text.x = element_blank(),
        axis.title = element_text(size = 14),
        legend.position = "bottom") +
  labs(x = "Protein exposure", y = "-log10(P) Allelic series")

Dopamine Beta Hydrozylase (DBH) on systolic blood pressure

Code
plot_wald_vep(gene = "DBH", outcome = "SBP")

Gene based analysis

In gene based analysis, variant aggregate approaches are used to test for the effect of carrying any single or combination of rare genetic variants in a gene on a phenotype. Variant masks are often generated based on variant properties (e.g putative loss of function or missense variants), allowing the consequence of varying degrees of gene perturbation to be assessed. Here gene-based masks have been testing for cis and trans associations with plasma protein levels and complex traits, allowing them to be used in an MR framework. It is perhaps most straight forward to handle gene-level associations as single instuments and interpret Wald ratios, although in some case multiple gene-level instruments can exist for a given exposure.

Code
# Extract gene-based burden mask and calculate Wald ratios
res_masks <- lapply(harmonised_snps, function(x){
  masks <- lapply(x, function(y){
    dat <- y[IV_sets[1:4]] # gene mask results
    # Remove empty results dfs
    dat <- dat[unlist(lapply(dat, function(x){all(!(is.na(x$SNP)))}))]
    
    dat <- do.call("rbind", lapply(dat, function(z){
      z |> dplyr::select(
        mask = test.exposure, 
        mask_gene = SNP, 
        chr = chr.outcome, 
        pos = pos.outcome, 
        exposure, outcome, 
        beta.exposure, beta.outcome, 
        se.exposure, se.outcome, 
        pval.exposure, pval.outcome, 
        cis_trans)
    }))
  })
})

res_masks <- lapply(res_masks, function(x){
  out <- do.call("rbind",x)
  rownames(out) <- NULL
  return(out)})

# Wald ratios
res_masks <- lapply(res_masks, function(x){
  wrs <- apply(x, MARGIN = 1, FUN = function(x){
    TwoSampleMR::mr_wald_ratio(
    b_exp = as.numeric(x["beta.exposure"]),
    b_out = as.numeric(x["beta.outcome"]),
    se_exp = as.numeric(x["se.exposure"]),
    se_out = as.numeric(x["se.outcome"]))
  })

  wrs_df <- do.call("rbind",lapply(wrs, unlist))
  colnames(wrs_df) <- paste("wald_ratio", colnames(wrs_df), sep = "_")

  # Bind to harmonised data
  out <- cbind(x,wrs_df)
  return(out)
})

Gene-level cis-mask associations

Focusing on aggregate tests combining putative loss of function (protein truncating) and damaging rare varaints, which genes show significant causal effects on outcomes of interest:

Code
res_masks_cis <- lapply(res_masks, function(x){
  out <- x |> 
    dplyr::filter(cis_trans == "cis",
                            mask == "ptvraredmg") |> 
    dplyr::mutate(wald_ratio_pval_adj = p.adjust(wald_ratio_pval, method = "BH")) |>
    dplyr::arrange(wald_ratio_pval_adj)
  return(out)
})

# Plot Wald ratios for all gene maks associated with an outcome
plot_wald_masks <- function(results_masks, outcome, pthresh){
  dat <- results_masks[[outcome]] |> 
    dplyr::filter(wald_ratio_pval_adj < pthresh) |>
    dplyr::arrange(wald_ratio_b) |>
    dplyr::mutate(
      mask_gene = toupper(mask_gene),
      mask_gene = factor(mask_gene, levels = mask_gene))

  p_wald <- ggplot(dat, aes(x = wald_ratio_b, y = mask_gene)) +
    geom_vline(xintercept = 0, colour = "grey20", lty = 2) +
    geom_point(colour = "steelblue4", size = 2) +
    geom_errorbarh(aes(xmin = wald_ratio_b-1.96*wald_ratio_se, xmax = wald_ratio_b+1.96*wald_ratio_se),
                   colour = "steelblue4",
                   height = 0.4) +
    labs(x = "Wald Ratio (By/Bx)", y = "Burden test gene \n (instrumenting cis-protein expression)", 
         title = paste("Outcome:", outcome),
         subtitle = "cis-IVs (pLOF and rare damaging)") +
    theme_bw() +
    theme(plot.title = element_text(hjust = 0.5),
          plot.subtitle = element_text(hjust = 0.5))
  
  return(p_wald)}

gridExtra::grid.arrange(grobs = lapply(names(res_masks_cis)[-1], plot_wald_masks, results_masks = res_masks_cis, pthresh = 0.1),
                        nrow = 2)
Figure 54

Cis vs trans gene-level mask associations

Code
res_masks_trans <- lapply(res_masks, function(x){
  out <- x |> 
    dplyr::filter(cis_trans == "trans",
                            mask == "ptvraredmg") |> 
    dplyr::mutate(wald_ratio_pval_adj = p.adjust(wald_ratio_pval, method = "BH")) |>
    dplyr::arrange(wald_ratio_pval_adj)
  return(out)
})

# Extract trans IVs for gene showing significant cis-instrumented causal effects
comp_masks_cistrans <- mapply(x = res_masks_cis, y = res_masks_trans, function(x,y){
  x <- x[,c("mask", "mask_gene", "exposure", "outcome", "wald_ratio_b", "wald_ratio_se", "wald_ratio_pval", "wald_ratio_pval_adj")]
  y <- y[,c("mask", "mask_gene", "exposure", "outcome", "wald_ratio_b", "wald_ratio_se", "wald_ratio_pval", "wald_ratio_pval_adj")]
  join_res <- x |> left_join(y, by = c("exposure", "outcome", "mask"), suffix = c("_cis","_trans"))
  return(join_res)
}, SIMPLIFY = F)

# Plot comparison of wald ratios for significant cis or trans IV associations
# lapply(comp_masks_cistrans, function(x){
#   dat <- x |> 
#     dplyr::filter(wald_ratio_pval_adj_cis < 0.1 | wald_ratio_pval_adj_trans < 0.1) |>
#     na.exclude()
#   
#   dat |> ggplot(aes(x = wald_ratio_b_cis, y = wald_ratio_b_trans)) +
#     geom_point()
# })

Intersting cases:

Rare vs common IV heterogeneity: - CFB_Hb1Ac, MOG_HbA1c, MPO_CorpVol, KLK15_LDL, CD14_SBP, SMPD1_SBP, CDHR5_VitD

Cis vs trans IV heterogeneity: - APOE_LDL, SBSN_VitD, CLSTN3_RBCcount, IGSF9_IGF1

Next steps

  • Look into examples where:

    • common variants give highly significant causal estimates and rare variants don’t and vice versa
  • differences between GWAS (regulatory) and WES (coding) common variant IVW estimates

  • differences between cis and trans instrumentation

  • Some case studies with a more thorough interrogation of where the instruments are sitting and the biology they are instrumenting

  • Look at estimates from the gene-based masks

  • Pleiotropy:

    • Compare the pleiotropy profiles of rare and common (and cis and trans) instruments
  • Rare regulatory variants (required WGS data)